Abstract

In this paper we present a framework to analyze the asymptotic behavior
of two timescale stochastic approximation algorithms, including those
with set-valued mean fields. This paper builds on the works of Borkar
and of Perkins & Leslie. The framework presented herein is more general
than the synchronous two timescale framework of Perkins & Leslie, yet
the assumptions involved are easily verifiable. As an application, we
use this framework to analyze the two timescale stochastic
approximation algorithm corresponding to the Lagrangian dual problem
in optimization theory.


1 Introduction 

The classical dynamical systems approach was developed by Benaïm and by
Benaïm and Hirsch [3]. They showed that the asymptotic behavior of a
stochastic approximation algorithm (SA) can be studied by analyzing the
asymptotics of the associated ordinary differential equation (o.d.e.).
This method is popularly known as the o.d.e. method and was originally
introduced by Ljung [12]. In 2005, Benaïm, Hofbauer and Sorin extended the
dynamical systems approach to include the situation where the stochastic
approximation algorithm tracks a solution to the associated differential
inclusion. Such algorithms are called stochastic recursive inclusions. For a
detailed exposition on SA, the reader is referred to the books by Borkar [8]
and by Kushner and Yin.

There are many applications where the aforementioned paradigms are
inadequate. For example, the right hand side of a SA may require further
averaging or an additional recursion to evaluate it. An instance mentioned in
Borkar [7] is the ‘adaptive heuristic critic’ approach to reinforcement
learning, which requires a stationary value iteration executed between two
policy iterations. To solve such problems, Borkar [7] analyzed two timescale
SA algorithms. The two timescale paradigm presented in Borkar [7] is
inadequate if the coupled iterates are stochastic recursive inclusions. Such
iterates arise naturally in many learning algorithms, see for instance
Section 5 of [13]. For another application from convex optimization the
reader is referred to Section 4 of this paper. Such iterates also arise in
applications that involve projections onto non-convex sets. The first attempt
at tackling this problem was made by Perkins and Leslie [13] in 2012. They
extended the two timescale scheme of Borkar [7] to include the situation when
the two iterates track solutions to differential inclusions.

Consider the following coupled recursion:

$$x_{n+1} = x_n + a(n)\left[u_n + M^1_{n+1}\right],$$
$$y_{n+1} = y_n + b(n)\left[v_n + M^2_{n+1}\right], \tag{1}$$

where $u_n \in h(x_n, y_n)$, $v_n \in g(x_n, y_n)$,
$h : \mathbb{R}^{d+k} \to \{\text{subsets of } \mathbb{R}^d\}$ and
$g : \mathbb{R}^{d+k} \to \{\text{subsets of } \mathbb{R}^k\}$. Such iterates
were analyzed in [13]. Further, as an application, a Markov decision process
(MDP) based actor critic type learning algorithm was also presented in [13].

In this paper we generalize the synchronous two timescale stochastic
approximation scheme presented in [13]. We present sufficient conditions that
are mild and easily verifiable. For a complete list of assumptions used
herein, the reader is referred to Section 2.2, and for the analyses under
these conditions the reader is referred to Section 3. It is worth noting that
the analysis of the faster timescale proceeds in a predictable manner;
however, the analysis of the slower timescale presented herein is, to the
best of our knowledge, new to the literature.

In convex optimization, one is interested in minimizing a convex objective
function subject to a few constraints. A solution to this optimization
problem is a set of vectors that minimize the objective function. Often this
set is referred to as a minimum set. In Section 4 we analyze the two
timescale SA algorithm corresponding to the Lagrangian dual of a primal
problem. As we shall see later, this analysis considers a family of minimum
sets and, as a consequence of our framework, these minimum sets are no longer
required to be singleton. In [5], Dantzig, Folkman and Shapiro presented
sufficient conditions for the continuity of minimum sets of continuous
functions. We shall use results from that paper to show that under some
standard convexity conditions the assumptions of Section 2.2 are satisfied.
We then conclude from our main result, Theorem 3, that the two timescale
algorithm in question converges to a solution to the dual problem.

2 Preliminaries and assumptions 

2.1 Definitions and notations 

The definitions and notations used in this paper are similar to those in
Benaïm et al., Aubin et al. [1] and Borkar [8]. We present a few for easy
reference.
Let $H$ be an upper semi-continuous, set-valued map on $\mathbb{R}^d$, where
for any $x \in \mathbb{R}^d$, $H(x)$ is compact and convex valued. Note that
we say that $H$ is upper semi-continuous when $x_n \to x$, $y_n \to y$ and
$y_n \in H(x_n)$ $\forall n$ implies $y \in H(x)$. Consider the differential
inclusion (DI)

$$\dot{x} \in H(x). \tag{2}$$

We say that $\mathbf{x} \in \Sigma$ if $\mathbf{x}$ is an absolutely
continuous map that satisfies (2).

The set-valued semiflow $\Phi$ associated with (2) is defined on
$[0, +\infty) \times \mathbb{R}^d$ as:
$\Phi_t(x) = \{\mathbf{x}(t) \mid \mathbf{x} \in \Sigma, \mathbf{x}(0) = x\}$.
Let $T \times M \subset [0, +\infty) \times \mathbb{R}^d$ and define

$$\Phi_T(M) = \bigcup_{t \in T,\ x \in M} \Phi_t(x).$$

$M \subseteq \mathbb{R}^d$ is invariant if for every $x \in M$ there exists a
complete trajectory in $M$, say $\mathbf{x} \in \Sigma$ with
$\mathbf{x}(0) = x$.

Let $x \in \mathbb{R}^d$ and $A \subseteq \mathbb{R}^d$, then
$d(x, A) := \inf\{\|x - y\| \mid y \in A\}$. We define the $\delta$-open
neighborhood of $A$ by $N^\delta(A) := \{x \mid d(x, A) < \delta\}$. The
$\delta$-closed neighborhood of $A$ is defined by
$\overline{N}^\delta(A) := \{x \mid d(x, A) \le \delta\}$.

Let $M \subseteq \mathbb{R}^d$; the $\omega$-limit set is given by
$\omega_\Phi(M) := \bigcap_{t \ge 0} \overline{\Phi_{[t, +\infty)}(M)}$.
Similarly the limit set of a solution $\mathbf{x}$ is given by
$L(\mathbf{x}) = \bigcap_{t \ge 0} \overline{\mathbf{x}([t, +\infty))}$.

$A \subseteq \mathbb{R}^d$ is an attractor if it is compact, invariant and
there exists a neighborhood $U$ such that for any $\epsilon > 0$,
$\exists\, T(\epsilon) > 0$ such that
$\Phi_{[T(\epsilon), +\infty)}(U) \subset N^\epsilon(A)$. Such a $U$ is called
the fundamental neighborhood of $A$. The basin of attraction of $A$ is given
by $B(A) = \{x \mid \omega_\Phi(x) \subset A\}$. If $B(A) = \mathbb{R}^d$,
then the set is called a globally attracting set. It is called Lyapunov
stable if for all $\delta > 0$, $\exists\, \epsilon > 0$ such that
$\Phi_{[0, +\infty)}(N^\epsilon(A)) \subseteq N^\delta(A)$.
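As a small illustration of the distance and neighborhood definitions above, the following sketch evaluates $d(x, A)$ and tests membership in the open and closed neighborhoods. It assumes that $A$ is a finite set of points in the plane, which is a simplification made only for this example; the function names are hypothetical.

```python
import math

def dist_to_set(x, A):
    """d(x, A) = inf{||x - y|| : y in A}, for a finite non-empty set A."""
    return min(math.dist(x, y) for y in A)

def in_open_neighborhood(x, A, delta):
    """Membership in the delta-open neighborhood {x : d(x, A) < delta}."""
    return dist_to_set(x, A) < delta

def in_closed_neighborhood(x, A, delta):
    """Membership in the delta-closed neighborhood {x : d(x, A) <= delta}."""
    return dist_to_set(x, A) <= delta

# Illustrative finite set A and query point (assumptions of this example).
A = [(3.0, 4.0), (1.0, 0.0)]
origin = (0.0, 0.0)
print(dist_to_set(origin, A))
print(in_open_neighborhood(origin, A, 1.0), in_closed_neighborhood(origin, A, 1.0))
```

The origin lies at distance exactly 1 from $A$, so it belongs to the closed neighborhood of radius 1 but not to the open one.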

A set-valued map $h : \mathbb{R}^n \to \{\text{subsets of } \mathbb{R}^m\}$
is called a Marchaud map if it satisfies the following properties:

(i) For each $z \in \mathbb{R}^n$, $h(z)$ is convex and compact.

(ii) (point-wise boundedness) For each $z \in \mathbb{R}^n$,
$\sup_{w \in h(z)} \|w\| < K(1 + \|z\|)$ for some $K > 0$.

(iii) $h$ is an upper semi-continuous map.

The open ball of radius $r$ around 0 is represented by $B_r(0)$, while the
closed ball is represented by $\overline{B}_r(0)$.
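To make property (ii) concrete, the sketch below checks the linear growth bound on a grid for a hypothetical interval-valued map $h(z) = [z - 1, z + 1]$ on the real line. The map, the grid and the function names are assumptions of this example. A grid check is only a sanity test, not a proof, and it does not address properties (i) and (iii), which hold for this map because its values are closed intervals whose endpoints vary continuously with $z$.

```python
def h(z):
    """Hypothetical set-valued map: the closed interval [z - 1, z + 1] as (lo, hi)."""
    return (z - 1.0, z + 1.0)

def sup_norm(interval):
    """sup{|w| : w in [lo, hi]} is attained at one of the endpoints."""
    lo, hi = interval
    return max(abs(lo), abs(hi))

def satisfies_linear_growth(K, points):
    """Check sup_{w in h(z)} |w| < K (1 + |z|) at every sampled z."""
    return all(sup_norm(h(z)) < K * (1.0 + abs(z)) for z in points)

grid = [i / 10.0 for i in range(-1000, 1001)]
print(satisfies_linear_growth(2.0, grid))   # K = 2 works, since sup|w| = |z| + 1
print(satisfies_linear_growth(0.5, grid))   # K = 0.5 is too small
```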


2.2 Assumptions 

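Recall that we have the following coupled recursion:

$$x_{n+1} = x_n + a(n)\left[u_n + M^1_{n+1}\right],$$
$$y_{n+1} = y_n + b(n)\left[v_n + M^2_{n+1}\right],$$

where $u_n \in h(x_n, y_n)$, $v_n \in g(x_n, y_n)$,
$h : \mathbb{R}^{d+k} \to \{\text{subsets of } \mathbb{R}^d\}$ and
$g : \mathbb{R}^{d+k} \to \{\text{subsets of } \mathbb{R}^k\}$.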

We list below our assumptions. 

(A1) $h$ and $g$ are Marchaud maps.

(A2) $\{a(n)\}_{n \ge 0}$ and $\{b(n)\}_{n \ge 0}$ are two scalar sequences
such that: $a(n), b(n) > 0$ for all $n$,
$\sum_{n \ge 0} (a(n) + b(n)) = \infty$,
$\sum_{n \ge 0} (a(n)^2 + b(n)^2) < \infty$ and
$\lim_{n \to \infty} \frac{b(n)}{a(n)} = 0$. Without loss of generality, we
let $\sup_n a(n), \sup_n b(n) \le 1$.

(A3) $\{M^i_n\}_{n \ge 1}$, $i = 1, 2$, are square integrable martingale
difference sequences with respect to the filtration
$\mathcal{F}_n := \sigma(x_m, y_m, M^1_m, M^2_m : m \le n)$, $n \ge 0$, such
that
$E\left[\|M^i_{n+1}\|^2 \mid \mathcal{F}_n\right] \le K\left(1 + (\|x_n\| + \|y_n\|)^2\right)$,
$i = 1, 2$, for some constant $K > 0$. Without loss of generality assume
that the same constant, $K$, works for both (A1) (in the property (ii) of
Marchaud maps, see Section 2.1) and (A3).

(A4) $\sup_n \{\|x_n\| + \|y_n\|\} < \infty$ a.s.

(A5) For each $y \in \mathbb{R}^k$ the differential inclusion
$\dot{x}(t) \in h(x(t), y)$ has a globally attracting set, $A_y$, that is
also Lyapunov stable. Further, $\sup_{x \in A_y} \|x\| \le K(1 + \|y\|)$.
The set-valued map
$\lambda : \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^d\}$, where
$\lambda(y) = A_y$, is upper semi-continuous.

Define for each $y \in \mathbb{R}^k$ a function
$G(y) := \overline{co}\left(\bigcup_{x \in \lambda(y)} g(x, y)\right)$. The
convex closure of a set $A \subseteq \mathbb{R}^k$, denoted by
$\overline{co}(A)$, is the closure of the convex hull of $A$, i.e., the
closure of the smallest convex set containing $A$. It will be shown later
that $G$ is a Marchaud map.

(A6) $\dot{y}(t) \in G(y(t))$ has a globally attracting set, $A_0$, that is
also Lyapunov stable.
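The step size conditions in (A2) can be checked numerically for a candidate pair of sequences. The sketch below uses $a(n) = (n+1)^{-0.6}$ and $b(n) = (n+1)^{-1}$, which are choices made for this example and are not prescribed by the paper. A finite computation can only suggest, not prove, the divergence and summability properties.

```python
def a(n):
    """Faster timescale step size (an illustrative choice)."""
    return (n + 1) ** -0.6

def b(n):
    """Slower timescale step size (an illustrative choice)."""
    return 1.0 / (n + 1)

N = 100_000
# Partial sum of a(n) + b(n): keeps growing as N increases.
sum_steps = sum(a(n) + b(n) for n in range(N))
# Partial sum of a(n)^2 + b(n)^2: stays bounded as N increases.
sum_squares = sum(a(n) ** 2 + b(n) ** 2 for n in range(N))
# Ratio b(n)/a(n) = (n+1)^(-0.4), which tends to 0.
ratio = b(N) / a(N)

print(sum_steps, sum_squares, ratio)
```

With these choices both sequences are bounded by 1, matching the normalization assumed in (A2).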

With respect to the faster timescale, the slower timescale iterates appear
stationary, hence the faster timescale iterates track a solution to
$\dot{x}(t) \in h(x(t), y_0)$, where $y_0$ is fixed (see Theorem 1). The $y$
iterates track a solution to $\dot{y}(t) \in G(y(t))$ (see Theorem 2). It is
worth noting that Theorems 1 & 2 only require (A1)-(A5) to hold. Since
$G(\cdot)$ is the convex closure of a union of compact convex sets, one can
expect the set-valued map to be point-wise bounded and convex. However, it
is unclear why it should be upper semi-continuous (hence Marchaud). In
Lemma 2 we prove that $G$ is indeed Marchaud without any additional
assumptions.
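The timescale separation described above can be observed in a toy simulation. The following is a minimal sketch under assumptions chosen for this example only: scalar iterates, $h(x, y) = -(x - \mathrm{proj}_{[y-1, y+1]}(x))$, whose attracting set $\lambda(y) = [y - 1, y + 1]$ is not a singleton, $g(x, y) = -x$, Gaussian noise, and the step sizes $a(n) = (n+1)^{-0.6}$, $b(n) = (n+1)^{-1}$. For this choice $G(y) = [-y - 1, -y + 1]$, and one can check that $[-1, 1]$ attracts the solutions of $\dot{y} \in G(y)$.

```python
import random

def proj_interval(x, lo, hi):
    """Euclidean projection of x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def h(x, y):
    # Drift of the fast iterate; it vanishes exactly on [y - 1, y + 1].
    return -(x - proj_interval(x, y - 1.0, y + 1.0))

def g(x, y):
    # Drift of the slow iterate.
    return -x

def simulate(num_iters=20000, x0=5.0, y0=5.0, sigma=0.1, seed=0):
    rng = random.Random(seed)
    x, y = x0, y0
    for n in range(num_iters):
        a_n = (n + 1) ** -0.6      # faster timescale
        b_n = 1.0 / (n + 1)        # slower timescale, b_n / a_n -> 0
        x_next = x + a_n * (h(x, y) + rng.gauss(0.0, sigma))
        y_next = y + b_n * (g(x, y) + rng.gauss(0.0, sigma))
        x, y = x_next, y_next
    return x, y

x_final, y_final = simulate()
# Distance of the fast iterate from lambda(y) = [y - 1, y + 1].
gap = abs(x_final - proj_interval(x_final, y_final - 1.0, y_final + 1.0))
print(x_final, y_final, gap)
```

In this example the fast iterate ends up close to $\lambda(y_n)$ while the slow iterate settles in a neighborhood of $[-1, 1]$, which is the qualitative behavior the framework predicts.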

Over the course of this paper we shall see that (A5) is the key assumption
that links the asymptotic behaviors of the faster and slower timescale
iterates. It may be noted that (A5) is weaker than the corresponding
assumption (B6)/(B6)' used in [13]. For example, (B6)' requires that
$\lambda(y)$ and $\bigcup_{x \in \lambda(y)} g(x, y)$ be convex for every
$y \in \mathbb{R}^k$, while (B6) requires that $\lambda(y)$ be singleton for
every $y \in \mathbb{R}^k$. The reader is referred to [13] for more details.
Note that $\lambda(y)$ being a singleton is a strong requirement in itself
since it is the global attractor of some DI. In most applications, both
$\lambda(y)$ and $\bigcup_{x \in \lambda(y)} g(x, y)$ will not be convex and
therefore (B6)/(B6)' are easily violated. Our application discussed in
Section 4 illustrates the same.
3 Proof of convergence

Before we start analyzing the coupled recursion given by (1), we prove a few
auxiliary results.

Lemma 1. Consider the differential inclusion $\dot{x}(t) \in H(x(t))$, where
$H : \mathbb{R}^n \to \{\text{subsets of } \mathbb{R}^n\}$ is a Marchaud map.
Let $A$ be the associated globally attracting set that is also Lyapunov
stable. Then $A$ is an attractor and every compact set containing $A$ is a
fundamental neighborhood.

Proof. Since $A$ is compact and invariant, it is left to prove the
following: given a compact set $K \subseteq \mathbb{R}^n$ such that
$A \subseteq K$, for each $\epsilon > 0$ there exists $T(\epsilon) > 0$ such
that $\Phi_t(K) \subseteq N^\epsilon(A)$ for all $t \ge T(\epsilon)$.

Since $A$ is Lyapunov stable, corresponding to $N^\epsilon(A)$ there exists
$N^\delta(A)$, where $\delta > 0$, such that
$\Phi_{[0, +\infty)}(N^\delta(A)) \subseteq N^\epsilon(A)$. Fix $x_0 \in K$.
Since $A$ is a globally attracting set, $\exists\, t(x_0) > 0$ such that
$\Phi_{t(x_0)}(x_0) \subseteq N^{\delta/4}(A)$. Further, from the upper
semi-continuity of flow it follows that
$\Phi_{t(x_0)}(x) \subseteq N^{\delta/4}(\Phi_{t(x_0)}(x_0))$ for all
$x \in N^{\delta(x_0)}(x_0)$, where $\delta(x_0) > 0$, see Chapter 2 of Aubin
and Cellina [1]. Hence we get $\Phi_{t(x_0)}(x) \subseteq N^\delta(A)$.
Further, since $A$ is Lyapunov stable, we get
$\Phi_{[t(x_0), +\infty)}(x) \subseteq N^\epsilon(A)$. In this manner for
each $x \in K$ we calculate $t(x)$ and $\delta(x)$; the collection
$\{N^{\delta(x)}(x) : x \in K\}$ is an open cover for $K$. Since $K$ is
compact, there exists a finite sub-cover
$\{N^{\delta(x_i)}(x_i) \mid 1 \le i \le m\}$. For
$T(\epsilon) := \max\{t(x_i) \mid 1 \le i \le m\}$, we have
$\Phi_{[T(\epsilon), +\infty)}(K) \subseteq N^\epsilon(A)$.

□

In Theorem 2 we prove that the slower timescale trajectory asymptotically
tracks a solution to $\dot{y}(t) \in G(y(t))$. The following lemma ensures
that the aforementioned DI has at least one solution.

Lemma 2. The map $G$ referred to in (A6) is a Marchaud map.

Proof. Fix an arbitrary $y \in \mathbb{R}^k$. For any $x \in \lambda(y)$, it
follows from (A1) that

$$\sup_{z \in g(x, y)} \|z\| \le K(1 + \|x\| + \|y\|).$$

From assumption (A5), we have that $\|x\| \le K(1 + \|y\|)$. Substituting in
the above equation we may conclude the following:

$$\sup_{z \in g(x, y)} \|z\| \le K\left(1 + K(1 + \|y\|) + \|y\|\right) = K(K + 1)(1 + \|y\|),$$
$$\sup_{z \in \bigcup_{x \in \lambda(y)} g(x, y)} \|z\| \le K(K + 1)(1 + \|y\|),$$
$$\sup_{z \in G(y)} \|z\| \le K(K + 1)(1 + \|y\|).$$

We have thus proven that $G$ is point-wise bounded. From the definition of
$G$, it follows that $G(y)$ is convex and compact.

It remains to show that $G$ is an upper semi-continuous map. Let $z_n \to z$
and $y_n \to y$ in $\mathbb{R}^k$ with $z_n \in G(y_n)$, $\forall n \ge 1$.
We need to show that $z \in G(y)$. We present a proof by contradiction. Since
$G(y)$ is convex and compact, $z \notin G(y)$ implies that there exists a
linear functional on $\mathbb{R}^k$, say $f$, such that
$\sup_{w \in G(y)} f(w) \le \alpha - \epsilon$ and
$f(z) \ge \alpha + \epsilon$, for some $\alpha \in \mathbb{R}$ and
$\epsilon > 0$. Since $z_n \to z$, there exists $N$ such that for all
$n \ge N$, $f(z_n) \ge \alpha + \frac{\epsilon}{2}$. In other words,
$G(y_n) \cap [f \ge \alpha + \frac{\epsilon}{2}] \ne \emptyset$ for all
$n \ge N$. Here the notation $[f \ge a]$ is used to denote the set
$\{x \mid f(x) \ge a\}$.

For the sake of convenience, we denote the set
$\bigcup_{x \in \lambda(y)} g(x, y)$ by $B(y)$. We claim that
$B(y_n) \cap [f \ge \alpha + \frac{\epsilon}{2}] \ne \emptyset$ for all
$n \ge N$. We prove this claim later; for now we assume that the claim is
true and proceed. Pick
$w_n \in g(x_n, y_n) \cap [f \ge \alpha + \frac{\epsilon}{2}]$, where
$x_n \in \lambda(y_n)$ and $n \ge N$. It can be shown that
$\{x_n\}_{n \ge N}$ and $\{w_n\}_{n \ge N}$ are norm bounded sequences and
hence contain convergent sub-sequences. Construct sub-sequences
$\{w_{n(k)}\}_{k \ge 1} \subseteq \{w_n\}_{n \ge N}$ and
$\{x_{n(k)}\}_{k \ge 1} \subseteq \{x_n\}_{n \ge N}$ such that
$\lim_{k \to \infty} w_{n(k)} = w$ and $\lim_{k \to \infty} x_{n(k)} = x$. It
follows from the upper semi-continuity of $g$ that $w \in g(x, y)$ and from
the upper semi-continuity of $\lambda$ that $x \in \lambda(y)$, hence
$w \in G(y)$. Since $f$ is continuous,
$f(w) \ge \alpha + \frac{\epsilon}{2}$. This is a contradiction.

It remains to prove that
$B(y_n) \cap [f \ge \alpha + \frac{\epsilon}{2}] \ne \emptyset$ for all
$n \ge N$. Suppose this were false, then
$\exists \{m(k)\}_{k \ge 1} \subseteq \{n \ge N\}$ such that
$B(y_{m(k)}) \subseteq [f < \alpha + \frac{\epsilon}{2}]$ for each
$k \ge 1$. It can be shown that
$\overline{co}(B(y_{m(k)})) \subseteq [f \le \alpha + \frac{\epsilon}{2}]$
for each $k \ge 1$. Since $z_{m(k)} \to z$ and $f(z) \ge \alpha + \epsilon$,
$\exists N_1$ such that for all $m(k) \ge N_1$,
$f(z_{m(k)}) > \alpha + \frac{\epsilon}{2}$. This is a contradiction since
$z_{m(k)} \in G(y_{m(k)}) = \overline{co}(B(y_{m(k)}))$. Hence we get
$B(y_n) \cap [f \ge \alpha + \frac{\epsilon}{2}] \ne \emptyset$ for all
$n \ge N$. □

It is worth noting that (A5) is a key requirement in the above proof. In the
next lemma, we show the convergence of the martingale noise terms.

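Lemma 3. The sequences $\{\zeta^1_n\}$ and $\{\zeta^2_n\}$, where
$\zeta^1_n := \sum_{m=0}^{n-1} a(m) M^1_{m+1}$ and
$\zeta^2_n := \sum_{m=0}^{n-1} b(m) M^2_{m+1}$, are convergent almost surely.

Proof. Although a proof of the above statement can be found in [5] or [5],
we provide one for the sake of completeness. We only prove the almost sure
convergence of $\{\zeta^1_n\}$, as the convergence of $\{\zeta^2_n\}$ can be
similarly shown.

It is enough to show that

$$\sum_{m=0}^{\infty} E\left[\|\zeta^1_{m+1} - \zeta^1_m\|^2 \mid \mathcal{F}_m\right] < \infty \text{ a.s.},$$
$$\text{i.e., } \sum_{m=0}^{\infty} a(m)^2 E\left[\|M^1_{m+1}\|^2 \mid \mathcal{F}_m\right] < \infty \text{ a.s.}$$

From assumption (A3) it follows that

$$\sum_{m=0}^{\infty} a(m)^2 E\left[\|M^1_{m+1}\|^2 \mid \mathcal{F}_m\right] \le K \sum_{m=0}^{\infty} a(m)^2 \left(1 + (\|x_m\| + \|y_m\|)^2\right).$$

From assumptions (A2) and (A4) it follows that

$$K \sum_{m=0}^{\infty} a(m)^2 \left(1 + (\|x_m\| + \|y_m\|)^2\right) < \infty \text{ a.s.}$$

□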
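The convergence asserted in Lemma 3 can be illustrated numerically. The sketch below is a toy example under assumptions made only here: scalar i.i.d. standard Gaussian noise, which is one particular martingale difference sequence, and step size $1/(m+1)$. It accumulates the weighted noise sum and reports how much the partial sums still move over the second half of the run.

```python
import random

def zeta_path(num_terms, step, sigma=1.0, seed=0):
    """Partial sums zeta_n = sum_{m<n} step(m) * M_{m+1} with Gaussian M."""
    rng = random.Random(seed)
    zeta, path = 0.0, [0.0]
    for m in range(num_terms):
        zeta += step(m) * rng.gauss(0.0, sigma)
        path.append(zeta)
    return path

path = zeta_path(20000, step=lambda m: 1.0 / (m + 1))
# Oscillation of the partial sums over the last 10000 terms.
tail_oscillation = max(path[10000:]) - min(path[10000:])
print(path[-1], tail_oscillation)
```

The tail oscillation is small because the variance added after index $n$ is bounded by the tail of the square-summable series of step sizes, which is the mechanism used in the proof.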
We now prove a couple of technical results that are essential to the proofs
of Theorems 1 and 2.

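Lemma 4. Given any $y_0 \in \mathbb{R}^k$ and $\epsilon > 0$, there exists
$\delta > 0$ such that for all $x \in N^\delta(\lambda(y_0))$, we have
$g(x, y_0) \subseteq N^\epsilon(G(y_0))$.

Proof. Assume the statement is not true. Then, $\exists\, \delta_n \downarrow 0$
and $x_n \in N^{\delta_n}(\lambda(y_0))$ such that
$g(x_n, y_0) \not\subseteq N^\epsilon(G(y_0))$, $n \ge 1$. In other words,
$\exists\, \gamma_n \in g(x_n, y_0)$ and
$\gamma_n \notin N^\epsilon(G(y_0))$ for each $n \ge 1$. Since $\{x_n\}$ and
$\{\gamma_n\}$ are bounded sequences there exist convergent sub-sequences,
$\lim_{k \to \infty} x_{n(k)} = x$ and
$\lim_{k \to \infty} \gamma_{n(k)} = \gamma$. Since
$x_{n(k)} \in N^{\delta_{n(k)}}(\lambda(y_0))$ and
$\delta_{n(k)} \downarrow 0$ it follows that $x \in \lambda(y_0)$ and hence
$g(x, y_0) \subseteq G(y_0)$. We also have that
$\gamma \notin N^\epsilon(G(y_0))$ as
$\gamma_{n(k)} \notin N^\epsilon(G(y_0))$ for all $k \ge 1$. Since $g$ is
upper semi-continuous it follows that $\gamma \in g(x, y_0)$ and hence
$\gamma \in G(y_0)$. This is a contradiction. □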

Lemma 5. Let $x_0 \in \mathbb{R}^d$ and $y_0 \in \mathbb{R}^k$ be such that
the statement of Lemma 4 is satisfied (with $x_0$ in place of $x$). If
$\lim_{n \to \infty} x_n = x_0$ and $\lim_{n \to \infty} y_n = y_0$ then
$\exists N$ such that $\forall n \ge N$,
$g(x_n, y_n) \subseteq N^\epsilon(G(y_0))$.

Proof. If not, $\exists \{n(k)\} \subseteq \{n\}$ such that
$\lim_{k \to \infty} n(k) = \infty$ and
$g(x_{n(k)}, y_{n(k)}) \not\subseteq N^\epsilon(G(y_0))$. Without loss of
generality assume that $\{n(k)\} = \{n\}$. In other words,
$\exists\, \gamma_n \in g(x_n, y_n)$ such that
$\gamma_n \notin N^\epsilon(G(y_0))$ for all $n \ge 1$. Since
$\{\gamma_n\}$ is a bounded sequence, it has a convergent sub-sequence,
i.e., $\lim_{m \to \infty} \gamma_{n(m)} = \gamma$. Since
$\lim_{m \to \infty} x_{n(m)} = x_0$, $\lim_{m \to \infty} y_{n(m)} = y_0$
and $g$ is upper semi-continuous it follows that $\gamma \in g(x_0, y_0)$
and finally from Lemma 4 we get that $\gamma \in N^\epsilon(G(y_0))$. This
is a contradiction. □


Before we proceed let us construct trajectories, using (1), with respect to
the faster timescale. Define $t(0) := 0$,
$t(n) := \sum_{i=0}^{n-1} a(i)$, $n \ge 1$. The linearly interpolated
trajectory $\bar{x}(t)$, $t \ge 0$, is constructed from the sequence
$\{x_n\}$ as follows: let $\bar{x}(t(n)) := x_n$ and for
$t \in (t(n), t(n+1))$, let

$$\bar{x}(t) := \left(\frac{t(n+1) - t}{t(n+1) - t(n)}\right) \bar{x}(t(n)) + \left(\frac{t - t(n)}{t(n+1) - t(n)}\right) \bar{x}(t(n+1)). \tag{3}$$

We construct a piecewise constant trajectory from the sequence $\{u_n\}$ as
follows: $u(t) := u_n$ for $t \in [t(n), t(n+1))$, $n \ge 0$.

Let us construct trajectories with respect to the slower timescale in a
similar manner. Define $s(0) := 0$, $s(n) := \sum_{i=0}^{n-1} b(i)$,
$n \ge 1$. Let $\tilde{y}(s(n)) := y_n$ and for $s \in (s(n), s(n+1))$, let

$$\tilde{y}(s) := \left(\frac{s(n+1) - s}{s(n+1) - s(n)}\right) \tilde{y}(s(n)) + \left(\frac{s - s(n)}{s(n+1) - s(n)}\right) \tilde{y}(s(n+1)). \tag{4}$$

Also $v(s) := v_n$ for $s \in [s(n), s(n+1))$, $n \ge 0$, is the
corresponding piecewise constant trajectory.

For $s \ge 0$, let $x^s(t)$, $t \ge 0$, denote the solution to
$\dot{x}^s(t) = u(s + t)$ with the initial condition
$x^s(0) = \bar{x}(s)$. Similarly, let $y^s(t)$, $t \ge 0$, denote the
solution to $\dot{y}^s(t) = v(s + t)$ with the initial condition
$y^s(0) = \tilde{y}(s)$.
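The construction of the timestamps and of the linearly interpolated trajectory can be sketched in a few lines. The step sizes and iterates below are made-up numbers for illustration, scalar iterates are assumed, and the function names are hypothetical; the query time is assumed to be non-negative.

```python
from bisect import bisect_right
from itertools import accumulate

def timestamps(steps):
    """t(0) = 0 and t(n) = steps[0] + ... + steps[n-1] for n >= 1."""
    return [0.0] + list(accumulate(steps))

def interpolate(t, times, iterates):
    """Piecewise linear trajectory through the points (times[n], iterates[n])."""
    if t >= times[-1]:
        return iterates[-1]
    n = bisect_right(times, t) - 1            # largest n with times[n] <= t
    w = (t - times[n]) / (times[n + 1] - times[n])
    return (1.0 - w) * iterates[n] + w * iterates[n + 1]

steps = [1.0, 0.5, 0.25]                      # a(0), a(1), a(2) (illustrative)
iterates = [0.0, 2.0, 1.0, 3.0]               # x_0, ..., x_3 (illustrative)
times = timestamps(steps)
print(times)
print(interpolate(1.25, times, iterates))     # halfway between x_1 and x_2
```

The same two functions produce the slower timescale trajectory when the steps $b(n)$ and the iterates $y_n$ are passed instead.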
The $y$ iterate in recursion (1) can be re-written as

$$y_{n+1} = y_n + a(n)\left[\frac{b(n)}{a(n)} v_n + \frac{b(n)}{a(n)} M^2_{n+1}\right]. \tag{5}$$

Define $\epsilon(n) := \frac{b(n)}{a(n)} v_n$ and
$M^3_{n+1} := \frac{b(n)}{a(n)} M^2_{n+1}$. It can be shown that the
stochastic iteration given by $y_{n+1} = y_n + a(n) M^3_{n+1}$ satisfies the
set of assumptions given in Benaïm [5]. From (A1), (A2) and (A4) it follows
that $\epsilon(n) \to 0$ almost surely. Since $\epsilon(n) \to 0$, the
recursion given by (5) and $y_{n+1} = y_n + a(n) M^3_{n+1}$ have the same
asymptotics. For a precise statement and proof the reader is referred to
Lemma 2.1 of [7].

Define $\bar{y}(t(n)) := y_n$, where $n \ge 0$, and $\bar{y}(t)$ for
$t \in (t(n), t(n+1))$ by

$$\bar{y}(t) := \left(\frac{t(n+1) - t}{t(n+1) - t(n)}\right) \bar{y}(t(n)) + \left(\frac{t - t(n)}{t(n+1) - t(n)}\right) \bar{y}(t(n+1)). \tag{6}$$

The trajectory $\bar{y}(\cdot)$ can be seen as an evolution of the $y$
iterate with respect to the faster timescale, $\{a(n)\}$.

Lemma 6. Almost surely every limit point, $y(\cdot)$, of
$\{\bar{y}(s + \cdot) \mid s \ge 0\}$ in $C([0, \infty), \mathbb{R}^k)$ as
$s \to \infty$ satisfies $y(t) = y(0)$, $t \ge 0$.

Proof. It can be shown that $y_{n+1} = y_n + a(n) M^3_{n+1}$, with
$M^3_{n+1} := \frac{b(n)}{a(n)} M^2_{n+1}$, satisfies the assumptions of
Benaïm [5]. Hence the corresponding linearly interpolated trajectory tracks
the solution to $\dot{y}(t) = 0$. The statement of the lemma then follows
trivially. □


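Lemma 7. For any $T > 0$,
$\lim_{s \to \infty} \sup_{t \in [0, T]} \|\bar{x}(s + t) - x^s(t)\| = 0$
and
$\lim_{s \to \infty} \sup_{t \in [0, T]} \|\tilde{y}(s + t) - y^s(t)\| = 0$,
a.s.

Proof. In order to prove the above lemma, it is enough to prove the
following:

$$\lim_{t(n) \to \infty}\ \sup_{0 \le t(n+m) - t(n) \le T} \|\bar{x}(t(n+m)) - x^{t(n)}(t(n+m) - t(n))\| = 0 \text{ and}$$
$$\lim_{s(n) \to \infty}\ \sup_{0 \le s(n+m) - s(n) \le T} \|\tilde{y}(s(n+m)) - y^{s(n)}(s(n+m) - s(n))\| = 0 \text{ a.s.}$$

Note the following:

$$\bar{x}(t(n+m)) = \bar{x}(t(n)) + \sum_{k=0}^{m-1} a(n+k)\left(u(t(n+k)) + M^1_{n+k+1}\right),$$
$$x^{t(n)}(t(n+m) - t(n)) = \bar{x}(t(n)) + \int_0^{t(n+m) - t(n)} u(t(n) + z)\, dz,$$
$$x^{t(n)}(t(n+m) - t(n)) = \bar{x}(t(n)) + \int_{t(n)}^{t(n+m)} u(z)\, dz. \tag{7}$$

From (7) we get,

$$\|\bar{x}(t(n+m)) - x^{t(n)}(t(n+m) - t(n))\| = \left\| \sum_{k=0}^{m-1} a(n+k) u(t(n+k)) - \sum_{k=0}^{m-1} \int_{t(n+k)}^{t(n+k+1)} u(z)\, dz + \sum_{k=0}^{m-1} a(n+k) M^1_{n+k+1} \right\|.$$

The R.H.S. of the above equation equals
$\left\| \sum_{k=0}^{m-1} a(n+k) M^1_{n+k+1} \right\|$ as

$$\sum_{k=0}^{m-1} a(n+k) u(t(n+k)) = \sum_{k=0}^{m-1} \int_{t(n+k)}^{t(n+k+1)} u(z)\, dz.$$

Since $\zeta^1_n := \sum_{m=0}^{n-1} a(m) M^1_{m+1}$, $n \ge 1$, converges
a.s. (Lemma 3), the first part of the claim follows.

The second part, for the $y$ iterates, can be similarly proven. □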

From assumptions (A1) and (A4) it follows that
$\{x^r(\cdot) \mid r \ge 0\}$ and $\{y^r(\cdot) \mid r \ge 0\}$ are
equicontinuous and pointwise bounded families of functions. By the
Arzelà-Ascoli theorem they are relatively compact in
$C([0, \infty), \mathbb{R}^d)$ and $C([0, \infty), \mathbb{R}^k)$
respectively. From Lemma 7 it then follows that
$\{\bar{x}(r + \cdot) \mid r \ge 0\}$ and
$\{\tilde{y}(r + \cdot) \mid r \ge 0\}$ are also relatively compact, see (3)
and (4) for the definitions of $\bar{x}(\cdot)$ and $\tilde{y}(\cdot)$,
respectively.

3.1 Convergence in the faster timescale 

The following theorem and its proof are similar to Theorem 2 from Chapter 5 
of Borkar [5]. We present a proof for the sake of completeness. 

Theorem 1. Almost surely, every limit point of
$\{\bar{x}(r + \cdot) \mid r \ge 0\}$ in $C([0, \infty), \mathbb{R}^d)$ is of
the form $x(t) = x(0) + \int_0^t u(z)\, dz$, where $u$ is a measurable
function such that $u(t) \in h(x(t), y(0))$, $t \ge 0$, for some fixed
$y(0) \in \mathbb{R}^k$.

Proof. Fix $T > 0$, then $\{u(r + t) \mid t \in [0, T]\}$, $r \ge 0$, can be
viewed as a subset of $L_2([0, T], \mathbb{R}^d)$. From (A1) and (A4) it
follows that the above is uniformly bounded and hence weakly relatively
compact. Let $\{r(n)\}$ be a sequence such that the following hold:

(i) $\lim_{n \to \infty} r(n) = \infty$.

(ii) There exists some $x(\cdot) \in C([0, \infty), \mathbb{R}^d)$ such that
$\bar{x}(r(n) + \cdot) \to x(\cdot)$ in $C([0, \infty), \mathbb{R}^d)$. This
is because $\{\bar{x}(r + \cdot) \mid r \ge 0\}$ is relatively compact in
$C([0, \infty), \mathbb{R}^d)$.

(iii) $\bar{y}(r(n) + \cdot) \to y(\cdot)$ in
$C([0, \infty), \mathbb{R}^k)$ for some
$y \in C([0, \infty), \mathbb{R}^k)$. It follows from Lemma 6 that
$y(t) = y(0)$ for all $t \ge 0$.

(iv) $u(r(n) + \cdot) \to u(\cdot)$ weakly in $L_2([0, T], \mathbb{R}^d)$.

From Lemma 7 it follows that $x^{r(n)}(\cdot) \to x(\cdot)$ in
$C([0, \infty), \mathbb{R}^d)$, and we have that
$\int_0^t u(r(n) + z)\, dz \to \int_0^t u(z)\, dz$ for $t \in [0, T]$.
Letting $n \to \infty$ in

$$x^{r(n)}(t) = x^{r(n)}(0) + \int_0^t u(r(n) + z)\, dz, \quad t \in [0, T],$$

we get $x(t) = x(0) + \int_0^t u(z)\, dz$, $t \in [0, T]$.

Since $u(r(n) + \cdot) \to u(\cdot)$ weakly in $L_2([0, T], \mathbb{R}^d)$,
there exists $\{n(k)\} \subseteq \{n\}$ such that $n(k) \uparrow \infty$ and

$$\frac{1}{N} \sum_{k=1}^{N} u(r(n(k)) + \cdot) \to u(\cdot)$$

strongly in $L_2([0, T], \mathbb{R}^d)$. Further, there exist
$\{N(m)\} \subseteq \{N\}$ such that $N(m) \uparrow \infty$ and

$$\frac{1}{N(m)} \sum_{k=1}^{N(m)} u(r(n(k)) + \cdot) \to u(\cdot) \tag{8}$$

a.e. in $[0, T]$.

Define $[t] := \max\{t(n) \mid t(n) \le t\}$. If we fix $t_0 \in [0, T]$
such that (8) holds, then
$u(r(n(k)) + t_0) \in h(\bar{x}([r(n(k)) + t_0]), \bar{y}([r(n(k)) + t_0]))$
for $k \ge 1$. Since
$\lim_{k \to \infty} \|\bar{x}(r(n(k)) + t_0) - \bar{x}([r(n(k)) + t_0])\| = 0$,
it follows that
$\lim_{k \to \infty} \bar{x}([r(n(k)) + t_0]) = x(t_0)$, and similarly, we
have that $\lim_{k \to \infty} \bar{y}([r(n(k)) + t_0]) = y(0)$. Since $h$ is
upper semi-continuous it follows that
$\lim_{k \to \infty} d(u(r(n(k)) + t_0), h(x(t_0), y(0))) = 0$. The set
$h(x(t_0), y(0))$ is compact and convex, hence it follows from (8) that
$u(t_0) \in h(x(t_0), y(0))$. □

3.2 Convergence in the slower timescale 

Theorem 2. For any $\epsilon > 0$, almost surely any limit point of
$\{\tilde{y}(r + \cdot) \mid r \ge 0\}$ in $C([0, \infty), \mathbb{R}^k)$ is
of the form $y(t) = y(0) + \int_0^t v(z)\, dz$, where $v$ is a measurable
function such that $v(t) \in \overline{N}^\epsilon(G(y(t)))$, $t \ge 0$.

Proof. Fix $T > 0$. As before let $\{r(n)\}_{n \ge 1}$ be a sequence such
that the following hold:

(i) $\lim_{n \to \infty} r(n) = \infty$.

(ii) $\tilde{y}(r(n) + \cdot) \to y(\cdot)$ in
$C([0, \infty), \mathbb{R}^k)$, where
$y(\cdot) \in C([0, \infty), \mathbb{R}^k)$.

(iii) $v(r(n) + \cdot) \to v(\cdot)$ weakly in $L_2([0, T], \mathbb{R}^k)$.

Also, as before, we have the following:

(i) There exists $\{n(k)\} \subseteq \{n\}$ such that
$\frac{1}{N} \sum_{k=1}^{N} v(r(n(k)) + \cdot) \to v(\cdot)$ strongly in
$L_2([0, T], \mathbb{R}^k)$ as $N \to \infty$.

(ii) There exist $\{N(m)\} \subseteq \{N\}$ such that
$N(m) \uparrow \infty$ and

$$\frac{1}{N(m)} \sum_{k=1}^{N(m)} v(r(n(k)) + \cdot) \to v(\cdot) \tag{9}$$

a.e. on $[0, T]$.

Choose $t_0 \in (0, T)$ such that (9) is satisfied. Define
$[s]' := \max\{s(n) \mid s(n) \le s\}$. Construct a sequence
$\{m(n)\}_{n \ge 1} \subseteq \mathbb{N}$ such that
$s(m(n)) = [r(n) + t_0]'$ for each $n \ge 1$. Observe that
$\bar{y}(t(m(n))) = \tilde{y}(s(m(n)))$ and
$v(r(n) + t_0) \in g(\bar{x}(t(m(n))), \bar{y}(t(m(n))))$.

If we show that $\exists N$ such that for all $n \ge N$,
$g(\bar{x}(t(m(n))), \bar{y}(t(m(n)))) \subseteq N^\epsilon(G(y(t_0)))$,
then (9) implies that $v(t_0) \in \overline{N}^\epsilon(G(y(t_0)))$.

It remains to show the existence of such an $N$. We present a proof by
contradiction. We may assume without loss of generality that for each
$n \ge 1$,
$g(\bar{x}(t(m(n))), \bar{y}(t(m(n)))) \not\subseteq N^\epsilon(G(y(t_0)))$,
i.e., $\exists\, \gamma_n \in g(\bar{x}(t(m(n))), \bar{y}(t(m(n))))$ such
that $\gamma_n \notin N^\epsilon(G(y(t_0)))$. Let $S_1$ be the set on which
(A4) is satisfied and $S_2$ be the set on which Lemma 6 holds. Clearly
$P(S_1 \cap S_2) = 1$. For each $\omega \in S_1 \cap S_2$,
$\exists R(\omega) < \infty$ such that
$\sup_n \left(\|x_n(\omega)\| + \|y_n(\omega)\|\right) \le R(\omega)$ and
$\sup_n K(1 + \|y_n(\omega)\|) \le R(\omega)$. In what follows we merely use
$R$ and the dependence on $\omega$ (sample path) is understood to be
implicit. Let $\delta > 0$ be as given by Lemma 4 for $y(t_0)$ and
$\epsilon$. From Lemma 1 it follows that corresponding to
$\dot{x}(t) \in h(x(t), y(t_0))$ and this $\delta$ there exists $T_0$,
possibly dependent on $R$, such that for all $t \ge T_0$,
$\Phi_t(x_0) \subseteq N^\delta(\lambda(y(t_0)))$ for all
$x_0 \in \overline{B}_R(0)$.

We construct a new sequence $\{l(n)\}_{n \ge 1}$ from $\{m(n)\}_{n \ge 1}$
such that $t(l(n)) = \min\{t(m) \mid |t(m(n)) - t(m)| \le T_0\}$. Since
$\{\bar{x}(r + \cdot) \mid r \ge 0\}$ is relatively compact in
$C([0, \infty), \mathbb{R}^d)$, it follows that
$\bar{x}(t(l(n)) + \cdot) \to x(\cdot)$ in $C([0, T_0], \mathbb{R}^d)$. From
Lemma 6 we can conclude that $\bar{y}(t(l(n)) + \cdot) \to y(\cdot)$ in
$C([0, T_0], \mathbb{R}^k)$, where $y(t) = y(t_0)$ for all
$t \in [0, T_0]$. Lemma 6 only asserts that the limiting function is a
constant; we recognize this constant to be $y(t_0)$ since
$\|\bar{y}(t(l(n)) + T_0) - \bar{y}(t(m(n)))\| \to 0$ and
$\bar{y}(t(l(n)) + T_0) \to y(t_0)$. Note that in the foregoing discussion
we can only assert the existence of convergent subsequences; again, for the
sake of convenience, we assume that the sequences at hand are both
convergent. It follows from Theorem 1 that
$x(t) = x(0) + \int_0^t u(z)\, dz$, where $u(t) \in h(x(t), y(t_0))$. Since
$x(0) \in \overline{B}_R(0)$ it follows that
$x(T_0) \in N^\delta(\lambda(y(t_0)))$.

From Lemma 4 we get $g(x(T_0), y(t_0)) \subseteq N^\epsilon(G(y(t_0)))$.
Since $\|\bar{x}(t(m(n))) - \bar{x}(t(l(n)) + T_0)\| \to 0$ it follows that
$\bar{x}(t(m(n))) \to x(T_0)$. It follows from Lemma 5 that $\exists N$ such
that for $n \ge N$,
$g(\bar{x}(t(m(n))), \bar{y}(t(m(n)))) \subseteq N^\epsilon(G(y(t_0)))$.
This is a contradiction. □

A direct consequence of the above theorem is that almost surely any limit
point of $\{\tilde{y}(r + \cdot) \mid r \ge 0\}$ in
$C([0, \infty), \mathbb{R}^k)$ is of the form
$y(t) = y(0) + \int_0^t v(z)\, dz$, where $v$ is a measurable function such
that $v(t) \in G(y(t))$, $t \ge 0$.

3.3 Main result 

Theorem 3. Under assumptions (A1)-(A6), almost surely the set of
accumulation points is given by

$$\left\{(x, y) \mid \liminf_{n \to \infty} d\left((x, y), (x_n, y_n)\right) = 0\right\} \subseteq \bigcup_{y \in A_0} \{(x, y) \mid x \in \lambda(y)\}. \tag{10}$$

Proof. The statement follows directly from Theorems 1 and 2. □

Note that assumption (A6) allows us to narrow the set of interest. If (A6)
does not hold then we can only conclude that the R.H.S. of (10) is
$\bigcup_{y \in \mathbb{R}^k} \{(x, y) \mid x \in \lambda(y)\}$. On the
other hand if (A6) holds and $A_0$ consists of a single point, say $y_0$,
then the R.H.S. of (10) is $\{(x, y_0) \mid x \in \lambda(y_0)\}$. Further,
if $\lambda(y_0)$ is of cardinality one then the R.H.S. of (10) is just
$(\lambda(y_0), y_0)$.

Remark: It may be noted that all proofs and conclusions in this paper will
go through if (A1) is weakened to let $g$ be upper semi-continuous and
$g(x, \cdot)$ be Marchaud on $\mathbb{R}^k$ for each fixed
$x \in \mathbb{R}^d$.
4 Application: An SA algorithm to solve the
Lagrangian dual problem

Let $f : \mathbb{R}^d \to \mathbb{R}$ and
$g : \mathbb{R}^d \to \mathbb{R}^k$ be two given functions. We want to
minimize $f(x)$ subject to the condition that $g(x) \le 0$ (every component
of $g(x)$ is non-positive). This problem can be stated in the following
primal form:

$$\inf_{x \in \mathbb{R}^d}\ \sup_{\mu \in \mathbb{R}^k,\ \mu \ge 0} \left(f(x) + \mu^T g(x)\right). \tag{11}$$

Let us consider the following two timescale SA algorithm to solve the primal
(11):

$$\mu_{n+1} = \mu_n + a(n)\left[\nabla_\mu \left(f(x_n) + \mu_n^T g(x_n)\right) + M^2_{n+1}\right],$$
$$x_{n+1} = x_n - b(n)\left[\nabla_x \left(f(x_n) + \mu_n^T g(x_n)\right) + M^1_{n+1}\right], \tag{12}$$

where $a(n), b(n) > 0$,
$\sum_{n \ge 0} a(n) = \sum_{n \ge 0} b(n) = \infty$,
$\sum_{n \ge 0} \left(a(n)^2 + b(n)^2\right) < \infty$ and
$\frac{b(n)}{a(n)} \to 0$. Without loss of generality assume that
$\sup_n a(n), b(n) \le 1$. The sequences $\{M^1_n\}_{n \ge 1}$ and
$\{M^2_n\}_{n \ge 1}$ are suitable martingale difference noise terms.

Suppose there exists $x_0 \in \mathbb{R}^d$ such that $g(x_0) > 0$, then
$\mu = (\infty, \ldots, \infty)$ maximizes $f(x_0) + \mu^T g(x_0)$. With
respect to the faster timescale ($\mu$) iterates the slower timescale ($x$)
iterates can be viewed as being “quasi-static”, see [5] for more details. It
then follows from the aforementioned observation that the $\mu$ iterates
cannot be guaranteed to be stable. In other words, we cannot use (12) to
solve the primal problem.


If strong duality holds then solving (11) is equivalent to solving its dual
given by:

$$\sup_{\mu \in \mathbb{R}^k,\ \mu \ge 0}\ \inf_{x \in \mathbb{R}^d} \left(f(x) + \mu^T g(x)\right). \tag{13}$$

Further, the two timescale scheme to solve the dual problem is given by:

$$x_{n+1} = x_n - a(n)\left[\nabla_x \left(f(x_n) + \mu_n^T g(x_n)\right) + M^1_{n+1}\right],$$
$$\mu_{n+1} = \mu_n + b(n)\left[\nabla_\mu \left(f(x_n) + \mu_n^T g(x_n)\right) + M^2_{n+1}\right]. \tag{14}$$

Note that (14) is obtained by flipping the timescales of (12). Strong
duality can be enforced if we assume the following:

(S1) $f(x) = x^T Q x + b^T x + c$, where $Q$ is a positive semi-definite
$d \times d$ matrix, $b \in \mathbb{R}^d$ and $c \in \mathbb{R}$.

(S2) $g = A$, where $A$ is a $k \times d$ matrix.

(S3) $f$ is bounded from below.

The reader is referred to Bertsekas for further details. For the purposes of
this section we assume the following:
(S1)-(S3) are satisfied.

(A3)' $\sum_{n \ge 0} a(n) M^i_{n+1} < \infty$ a.s., where $i = 1, 2$.

The sole purpose of (A3) in Section 2.2 is to ensure the convergence of the
martingale noise terms, i.e., that (A3)' holds. It is clear that (14)
satisfies (A1) since (S1)-(S3) hold, while (A2) is the step size assumption
that is enforced.

The stability of the $\mu$ iterates in (14) directly follows from strong
duality and (A3)'. The $\mu$ iterates are “quasi-static” with respect to the
$x$ iterates. Further, since $f(x) + \mu_0^T g(x)$ is a convex function
(from (S1) and (S2)), for a fixed $\mu_0$, $f(x) + \mu_0^T g(x)$ achieves
its minimum “inside” $\mathbb{R}^d$. Hence, the stability of the $x$
iterates will follow from that of the $\mu$ iterates and (A3)'. In other
words, (14) satisfies (A1), (A2), (A3)' & (A4), see Section 2.2 for the
definitions of (A1), (A2) and (A4).

For a fixed $\mu_0$, the minimizers of $f(x) + \mu_0^T g(x)$ constitute the
global attractor of the o.d.e.
$\dot{x}(t) = -\nabla_x \left(f(x) + \mu_0^T g(x)\right)$. Our paradigm comes
in handy when this attractor set is not a singleton, which is generally the
case. In other words, we can define the following set-valued map:
$\lambda_m : \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^d\}$, where
$\lambda_m(\mu_0)$ is the global attractor of
$\dot{x}(t) = -\nabla_x \left(f(x) + \mu_0^T g(x)\right)$.

Now we check that (14) satisfies (A5). To do so it is enough to ensure that
$\lambda_m$ is an upper semi-continuous map. Recall that $\lambda_m(\mu)$ is
the minimum set of $f(x) + \mu^T g(x)$ for each $\mu \in \mathbb{R}^k$.
Dantzig, Folkman and Shapiro [5] studied the continuity of minimum sets of
continuous functions. A wealth of sufficient conditions can be found in that
paper which, when satisfied by the functions, guarantee “continuity” of the
corresponding minimum sets. In our case, since (S1)-(S3) are satisfied,
Corollary 1.2.3 of [9] guarantees upper semi-continuity of $\lambda_m$.

Since (A1)-(A5) are satisfied by (14), it follows from Theorems 1 & 2 that:

(I) Almost surely every limit point of
$\{\bar{x}(r + \cdot) \mid r \ge 0\}$ in $C([0, \infty), \mathbb{R}^d)$ is of
the form
$x(t) = x(0) - \int_0^t \nabla_x \left(f(x(z)) + \mu_0^T g(x(z))\right) dz$
for some $x(0) \in \mathbb{R}^d$ and some $\mu_0 \in \mathbb{R}^k$.

(II) Almost surely, any limit point of
$\{\tilde{\mu}(r + \cdot) \mid r \ge 0\}$ in $C([0, \infty), \mathbb{R}^k)$
is of the form $\mu(t) = \mu(0) + \int_0^t \nu(z)\, dz$ for some measurable
function $\nu$ with $\nu(t) \in G(\mu(t))$, $t \ge 0$, and
$G(\mu(t)) = \overline{co}\left(\left\{\nabla_\mu \left(f(x) + \mu(t)^T g(x)\right) \mid x \in \lambda_m(\mu(t))\right\}\right)$.

For the construction of $\bar{x}(\cdot)$ and $\tilde{\mu}(\cdot)$ see
equations (3) and (4) respectively. If in addition, (14) satisfies (A6),
i.e., $\exists\, A_\mu \subseteq \mathbb{R}^k$ such that it is the global
attractor of $\dot{\mu}(t) \in G(\mu(t))$, then it follows from Theorem 3
that almost surely any accumulation point of
$\{(x_n, \mu_n) \mid n \ge 0\}$ belongs to the set
$A := \bigcup_{\mu \in A_\mu} \{(x, \mu) \mid x \in \lambda_m(\mu)\}$. The
attractor $A_\mu$ is the maximum set of
$H(\mu) := \inf_{x \in \mathbb{R}^d} \left(f(x) + \mu^T g(x)\right)$ subject
to $\mu \ge 0$. It may be noted that $H$ is a concave function that is
bounded above as a consequence of strong duality. For any
$(x^*, \mu^*) \in A$ we have that

$$f(x^*) + (\mu^*)^T g(x^*) = \sup_{\mu \in \mathbb{R}^k,\ \mu \ge 0}\ \inf_{x \in \mathbb{R}^d} f(x) + \mu^T g(x).$$

In other words, almost surely the two timescale iterates given by (14)
converge to a solution of the dual (13). It follows from strong duality that
they almost surely converge to a solution of the primal (11).
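The dual scheme (14) can be tried on a one-dimensional instance. The following is a minimal sketch under assumptions made for this example: $f(x) = x^2 - 2x$ (so $Q = 1$, $b = -2$, $c = 0$), $g(x) = x$ (so $A = 1$), Gaussian noise, and step sizes $a(n) = (n+1)^{-0.6}$, $b(n) = (n+1)^{-1}$. The clipping of $\mu$ at zero is a safeguard added here to respect $\mu \ge 0$; it is not written in (14). For this instance the constrained minimizer is $x^* = 0$ with multiplier $\mu^* = 2$, since $\inf_x (x^2 - 2x + \mu x) = -(1 - \mu/2)^2$ is maximized at $\mu = 2$.

```python
import random

def grad_x(x, mu):
    # Derivative in x of f(x) + mu * g(x), with f(x) = x^2 - 2x and g(x) = x.
    return 2.0 * x - 2.0 + mu

def grad_mu(x, mu):
    # Derivative in mu of f(x) + mu * g(x), which is g(x) = x.
    return x

def dual_two_timescale(num_iters=20000, x0=0.0, mu0=0.0, sigma=0.1, seed=0):
    rng = random.Random(seed)
    x, mu = x0, mu0
    for n in range(num_iters):
        a_n = (n + 1) ** -0.6      # faster timescale drives x (descent)
        b_n = 1.0 / (n + 1)        # slower timescale drives mu (ascent)
        x_next = x - a_n * (grad_x(x, mu) + rng.gauss(0.0, sigma))
        mu_next = mu + b_n * (grad_mu(x, mu) + rng.gauss(0.0, sigma))
        # Clipping at zero is an assumption of this sketch, not part of (14).
        x, mu = x_next, max(mu_next, 0.0)
    return x, mu

x_final, mu_final = dual_two_timescale()
print(x_final, mu_final)
```

In this instance the minimum sets $\lambda_m(\mu) = \{1 - \mu/2\}$ happen to be singletons because $Q$ is positive definite; the framework of this paper is aimed at the case where they are not.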


5 Conclusions 

In this paper we have presented a framework for the analysis of two
timescale stochastic approximation algorithms with set-valued mean fields.
Our framework generalizes the one by Perkins and Leslie. We note that the
analysis of the faster timescale proceeds in a predictable manner, but the
analysis of the slower timescale is, to the best of our knowledge, new to
the literature. As an application we analyze the two timescale scheme that
arises from the Lagrangian dual problem in optimization using our framework.
Our framework is applicable even when the minimum sets are not singleton.